What are we analyzing?
We aim to create an interactive correlation plot using
plotly that allows filtering to observe how correlations
change across different groups (e.g., species in the Iris dataset).
What the code does:
Loads the required libraries and the Iris dataset for analysis.
library(plotly)
library(dplyr)
data(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
What the code does:
Creates a custom function to calculate regression lines and correlation
coefficients for subsets of data (based on species).
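regression_lines <- function(data, species) {
  # Keep only the rows for the requested species
  data_sp <- data %>% filter(Species == species)
  # Fit a simple linear regression of petal length on sepal length
  model <- lm(Petal.Length ~ Sepal.Length, data = data_sp)
  # Predict along an evenly spaced grid spanning the observed sepal lengths
  x_range <- seq(min(data_sp$Sepal.Length), max(data_sp$Sepal.Length), length.out = 100)
  y_range <- predict(model, newdata = data.frame(Sepal.Length = x_range))
  # Pearson correlation for this species, rounded for display
  cor_value <- round(cor(data_sp$Sepal.Length, data_sp$Petal.Length), 2)
  data.frame(Sepal.Length = x_range, Petal.Length = y_range, Species = species, Correlation = cor_value)
}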
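The fitted line and the correlation coefficient returned by this function are closely related: in a simple linear regression, the slope equals r multiplied by the ratio of the two standard deviations. The following is a minimal sketch of that check on the setosa subset, using base R only. The names `setosa`, `slope`, and `slope_from_r` are illustrative and not part of the original code.

```r
# Illustrative check on one subset (base R only; names here are hypothetical).
setosa <- subset(iris, Species == "setosa")

# Slope of the fitted line Petal.Length ~ Sepal.Length.
model <- lm(Petal.Length ~ Sepal.Length, data = setosa)
slope <- unname(coef(model)["Sepal.Length"])

# Pearson correlation between the same two variables.
r <- cor(setosa$Sepal.Length, setosa$Petal.Length)

# For simple linear regression: slope = r * sd(y) / sd(x).
slope_from_r <- r * sd(setosa$Petal.Length) / sd(setosa$Sepal.Length)

# TRUE when the two routes to the slope agree up to numerical tolerance.
all.equal(slope, slope_from_r)
```

This is why the sign of r always matches the direction of the regression line drawn for each species.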
What the code does:
Generates regression lines and calculates correlation coefficients for
each species (Setosa, Versicolor, Virginica).
lines_setosa <- regression_lines(iris, "setosa")
lines_versicolor <- regression_lines(iris, "versicolor")
lines_virginica <- regression_lines(iris, "virginica")
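The three calls above follow the same pattern, so they could also be generated in a loop. The sketch below is one possible way to do that. To keep it runnable on its own, it restates the function in base R under the hypothetical name `regression_lines_base`; `species_names`, `lines_list`, and `lines_all` are likewise illustrative names.

```r
# Hypothetical base-R restatement of regression_lines(), so this sketch is self-contained.
regression_lines_base <- function(data, species) {
  data_sp <- data[data$Species == species, ]
  model <- lm(Petal.Length ~ Sepal.Length, data = data_sp)
  x_range <- seq(min(data_sp$Sepal.Length), max(data_sp$Sepal.Length), length.out = 100)
  y_range <- predict(model, newdata = data.frame(Sepal.Length = x_range))
  data.frame(
    Sepal.Length = x_range,
    Petal.Length = y_range,
    Species = species,
    Correlation = round(cor(data_sp$Sepal.Length, data_sp$Petal.Length), 2)
  )
}

# One data frame of line points per species, stored in a named list.
species_names <- levels(iris$Species)
lines_list <- lapply(species_names, function(sp) regression_lines_base(iris, sp))
names(lines_list) <- species_names

# Optionally stack everything into a single data frame.
lines_all <- do.call(rbind, lines_list)
```

With this layout, `lines_list$setosa` plays the role of `lines_setosa`, and adding a new group requires no extra code.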
What the code does:
Creates a scatter plot of Sepal.Length vs Petal.Length and includes a
filter for species.
fig <- plot_ly(
  data = iris,
  x = ~Sepal.Length,
  y = ~Petal.Length,
  color = ~Species,
  colors = c('blue', 'orange', 'green'),
  type = 'scatter',
  mode = 'markers',
  transforms = list(
    list(
      type = 'filter',
      target = ~Species,
      operation = '=',
      value = "setosa"
    )
  )
)
What the code does:
Adds dynamic regression lines for each species to the scatter plot.
fig <- fig %>%
  add_lines(
    data = lines_setosa,
    x = ~Sepal.Length,
    y = ~Petal.Length,
    line = list(color = 'blue', width = 1),
    # paste0() avoids the stray space before ")" that paste() would insert
    name = paste0("Setosa (r = ", lines_setosa$Correlation[1], ")")
  ) %>%
  add_lines(
    data = lines_versicolor,
    x = ~Sepal.Length,
    y = ~Petal.Length,
    line = list(color = 'orange', width = 1),
    name = paste0("Versicolor (r = ", lines_versicolor$Correlation[1], ")")
  ) %>%
  add_lines(
    data = lines_virginica,
    x = ~Sepal.Length,
    y = ~Petal.Length,
    line = list(color = 'green', width = 1),
    name = paste0("Virginica (r = ", lines_virginica$Correlation[1], ")")
  )
What the code does:
Includes a dropdown menu to filter the plot by species or show all data
points together.
fig <- fig %>%
  layout(
    title = "Dynamic Lines with Correlation Coefficients",
    xaxis = list(title = "Sepal Length (cm)"),
    yaxis = list(title = "Petal Length (cm)"),
    updatemenus = list(
      list(
        buttons = list(
          list(
            method = "restyle",
            args = list("transforms[0].value", "setosa"),
            label = "Iris-setosa"
          ),
          list(
            method = "restyle",
            args = list("transforms[0].value", "versicolor"),
            label = "Iris-versicolor"
          ),
          list(
            method = "restyle",
            args = list("transforms[0].value", "virginica"),
            label = "Iris-virginica"
          ),
          list(
            method = "restyle",
            args = list("transforms[0].value", unique(iris$Species)),
            label = "All"
          )
        ),
        direction = "down",
        x = 0.1,
        y = 1.15,
        showactive = TRUE
      )
    )
  )
fig
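The four dropdown buttons differ only in their filter value and label, so the list passed to `buttons` could be built programmatically. The sketch below uses base R only and produces a plain list with the same shape as the hand-written one. `make_button`, `species_buttons`, `all_button`, and `buttons` are hypothetical names, and the "All" entry here passes a character vector rather than the factor returned by `unique(iris$Species)`; whether plotly treats the two identically is an assumption worth checking in the rendered plot.

```r
species_names <- levels(iris$Species)

# Hypothetical helper: one "restyle" button that sets the filter value.
make_button <- function(value, label) {
  list(
    method = "restyle",
    args = list("transforms[0].value", value),
    label = label
  )
}

# One button per species, labelled "Iris-<species>" as in the code above.
species_buttons <- lapply(
  species_names,
  function(sp) make_button(sp, paste0("Iris-", sp))
)

# Final button carrying all species names (assumption: character vector, not factor).
all_button <- make_button(species_names, "All")

buttons <- c(species_buttons, list(all_button))
```

The resulting `buttons` object could then be supplied as `buttons = buttons` inside the `updatemenus` entry.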
What the plot shows:
This interactive plot visualizes the relationship between
Sepal.Length and Petal.Length for the Iris
dataset, with regression lines and correlation coefficients for each
species. Key features include:
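Species filter: a dropdown menu lets you select a single species (Setosa, Versicolor, or Virginica) or view all species together.
Regression lines: each species has a fitted line whose legend entry reports its correlation coefficient (r). These lines dynamically adjust based on the selected filter.
This plot is a useful tool for exploring both individual and combined group correlations, giving a clearer picture of the data structure.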
Key Insights:
The regression lines, together with their r values, provide
a clear understanding of the strength and direction of linear
relationships for each group.
Recommendation:
This approach is ideal for datasets with well-defined groups (e.g.,
categories or classes). Use dynamic filtering to explore correlations
efficiently across subsets. The interactive legend and zoom features
further enhance the user experience, making it suitable for exploratory
data analysis and presentations to diverse audiences.
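To see numerically why filtering by group matters, the pooled correlation can be compared with the within-species correlations. This is a minimal base-R sketch; `r_pooled` and `r_by_species` are illustrative names, not part of the original code.

```r
# Pearson correlation over all 150 flowers, ignoring species.
r_pooled <- cor(iris$Sepal.Length, iris$Petal.Length)

# The same correlation computed separately within each species.
r_by_species <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Petal.Length)
)

# Side-by-side view, rounded for readability.
round(c(pooled = r_pooled, r_by_species), 2)
```

The pooled value can differ noticeably from the within-group values, which is the kind of difference the species filter in the plot is meant to expose.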